A Data Analytic Framework for Unstructured Text

Authors

  • Hassanin M. Al-Barhamtoshy
  • Fathy E. Eassa
Abstract

This paper describes a systematic flow of unstructured data in industry: how data is collected and stored, and in what volume. Big data relies on scalable storage indexes and a distributed approach to retrieve the required information. The paper therefore introduces a framework for managing and discovering unstructured data in terms of the 3Vs of big data: variety, velocity, and volume. Different approaches to managing, collecting, and classifying Twitter data, e-mail data, and free text are required to manage resources more efficiently and to build a software platform around scalable analytics. The development process in this paper is implemented in Python; it builds up a lexicon and calculates a sentiment score. Analyzing Twitter and e-mail data answered many questions, such as: what are people talking about, and what is most important? The accuracy of the proposed classifier was 77.78 without stop words, and 78.76 and 79.94 with stop words (25 and 174, respectively). If the number of stop words is increased, the accuracy reaches 87.69. There was a 10% difference in accuracy between the Naïve Bayes and Maximum Entropy classifiers. [Hassanin M. Al-Barhamtoshy and Fathy E. Eassa. A Data Analytic Framework for Unstructured Text. Life Sci. J 2014; 11(10):339-350] (ISSN: 1097-8135). http://www.lifesciencesite.com. 48
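
The lexicon-based sentiment scoring mentioned in the abstract might look roughly like the sketch below. This is a minimal illustration, not the authors' implementation: the lexicon entries, the stop-word list, and the function names `tokenize`, `sentiment_score`, and `label` are all hypothetical, since the abstract does not give the actual lexicon or scoring rule.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight (the paper's lexicon is not shown).
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}

# Hypothetical stop-word list; the abstract mentions lists of 25 and 174 words.
STOP_WORDS = {"the", "a", "is", "this", "i", "and", "but", "of"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text, lexicon=LEXICON, stop_words=STOP_WORDS):
    """Sum the lexicon weights of all tokens that are not stop words."""
    tokens = [t for t in tokenize(text) if t not in stop_words]
    return sum(lexicon.get(t, 0) for t in tokens)


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["I love this great phone",
                 "This is a terrible, bad service",
                 "The meeting is at noon"]:
        score = sentiment_score(text)
        print(f"{text!r}: score={score}, label={label(score)}")
```

In this sketch stop words only shrink the token list, so they cannot change a pure lexicon score unless a stop word also appears in the lexicon. In a trained classifier such as Naïve Bayes, by contrast, the stop-word list changes the feature set, which is presumably why the reported accuracy varies with it.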


Similar resources

Arabic OCR Segmented-based System

A new investigation of an Arabic OCR system is presented for the offline recognition of machine-printed cursive words. A reliable transformation mechanism is used to transform image text into free text (ASCII or Unicode text) that can be directly searched by a computer. A traditional preprocessing model (segmentation phase) is therefore included to extract each word from ...


Assessing the Quality of Unstructured Data: An Initial Overview

In contrast to structured data, unstructured data such as texts, speech, videos and pictures do not come with a data model that enables a computer to use them directly. Nowadays, computers can interpret the knowledge encoded in unstructured data using methods from text analytics, image recognition and speech recognition. Therefore, unstructured data are used increasingly in decision-making proc...


Nonparametric Regression Estimation under Kernel Polynomial Model for Unstructured Data

Nonparametric estimation (NE) of the kernel polynomial regression (KPR) model is a powerful tool for visually depicting the effect of covariates on the response variable when the data are unstructured and heterogeneous. In this paper we introduce a KPR model that is a mixture of nonparametric regression models with a bootstrap algorithm, which is considered in a heterogeneous and unstructured framewo...


Turning Quantitative: An Analytic Scale to Do Critical Discourse Analysis

Critical Discourse Analysis (CDA) could be seen as a theory in qualitative more than in quantitative studies. This might have led to difficulty in doing CDA. Accordingly, this study attempted to develop a quantitative profile in the form of an analytic rubric. For this purpose, Fairclough’s model of CDA was selected as the research framework. The techniques used for structuring analy...


A generalized Framework of Privacy Preservation in Distributed Data mining for Unstructured Data Environment

The management of unstructured data is recognized as one of the major unsolved problems in the information industry and the data mining paradigm. Unstructured data is computerized information that either does not have a data model or is not easily usable by data mining. This paper proposes a solution to this problem by converting unstructured data into structured data using legacy system and...



Journal title: Life Sci. J

Volume 11, Issue 10

Pages 339-350

Publication date 2014